The values of the numeric-type parameters need to be normalized. We consider two normalization options:

  1. Normalize all variables linearly, into either [0,1] or [-1,1].
  2. Choose a separate normalization method for each variable, including nonlinear normalization.
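A minimal sketch of the two options. The linear variants are standard min-max scaling. The nonlinear variant (`log1p` followed by min-max) is only one illustrative choice and an assumption on our part; the document does not say which nonlinear transforms are used. All function names are hypothetical.

```python
import math

def minmax_01(values):
    """Linear min-max scaling into [0, 1]."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        # Constant column: map everything to 0 to avoid division by zero.
        return [0.0 for _ in values]
    return [(v - lo) / span for v in values]

def minmax_sym(values):
    """Linear scaling into [-1, 1]: rescale the [0, 1] result as 2*x - 1."""
    return [2.0 * x - 1.0 for x in minmax_01(values)]

def log_then_minmax(values):
    """Hypothetical nonlinear option: compress large values with log1p,
    then scale into [0, 1]. Requires every value to be > -1."""
    return minmax_01([math.log1p(v) for v in values])

x = [0.0, 1.0, 3.0, 1000.0]
print(minmax_01(x))        # small values are squeezed towards 0
print(minmax_sym(x))
print(log_then_minmax(x))  # small values stay spread out
```

With a long right tail, the linear variants leave most of the observations packed near the lower bound, while the log-based variant keeps them distinguishable. This is the usual motivation for choosing the method per variable.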

Linear normalization

##        x8              x13                 x23              x24         
##  Min.   :0.0000   Min.   :0.0000000   Min.   :0.0000   Min.   :0.00000  
##  1st Qu.:0.9590   1st Qu.:0.0004711   1st Qu.:0.1326   1st Qu.:0.01988  
##  Median :0.9638   Median :0.0024663   Median :0.1783   Median :0.02958  
##  Mean   :0.9614   Mean   :0.0055267   Mean   :0.1732   Mean   :0.03577  
##  3rd Qu.:0.9673   3rd Qu.:0.0065636   3rd Qu.:0.2145   3rd Qu.:0.04314  
##  Max.   :1.0000   Max.   :1.0000000   Max.   :1.0000   Max.   :1.00000  
##       x25                x26               x27               x28         
##  Min.   :0.000000   Min.   :0.00000   Min.   :0.00000   Min.   :0.00000  
##  1st Qu.:0.007371   1st Qu.:0.02507   1st Qu.:0.01027   1st Qu.:0.01069  
##  Median :0.015225   Median :0.04735   Median :0.01661   Median :0.01908  
##  Mean   :0.020625   Mean   :0.06053   Mean   :0.02659   Mean   :0.02679  
##  3rd Qu.:0.026478   3rd Qu.:0.08078   3rd Qu.:0.03055   3rd Qu.:0.03359  
##  Max.   :1.000000   Max.   :1.00000   Max.   :1.00000   Max.   :1.00000  
##       x29                 x30                 x31        
##  Min.   :0.0000000   Min.   :0.0000000   Min.   :0.0000  
##  1st Qu.:0.0002009   1st Qu.:0.0000000   1st Qu.:0.1013  
##  Median :0.0011873   Median :0.0000155   Median :0.1179  
##  Mean   :0.0045176   Mean   :0.0086791   Mean   :0.1267  
##  3rd Qu.:0.0041466   3rd Qu.:0.0061817   3rd Qu.:0.1376  
##  Max.   :1.0000000   Max.   :1.0000000   Max.   :1.0000  
##       x32               x33               x34             x35        
##  Min.   :0.00000   Min.   :0.00000   Min.   :0.000   Min.   :0.0000  
##  1st Qu.:0.00000   1st Qu.:0.07895   1st Qu.:0.000   1st Qu.:0.0000  
##  Median :0.00000   Median :0.15789   Median :0.000   Median :0.0000  
##  Mean   :0.06250   Mean   :0.16777   Mean   :0.421   Mean   :0.2668  
##  3rd Qu.:0.08333   3rd Qu.:0.23684   3rd Qu.:1.000   3rd Qu.:0.5000  
##  Max.   :1.00000   Max.   :1.00000   Max.   :1.000   Max.   :1.0000  
##       x36               x37               x38                x39         
##  Min.   :0.00000   Min.   :0.00000   Min.   :0.000000   Min.   :0.00000  
##  1st Qu.:0.02667   1st Qu.:0.01044   1st Qu.:0.000000   1st Qu.:0.02000  
##  Median :0.04000   Median :0.01914   Median :0.000000   Median :0.04000  
##  Mean   :0.04733   Mean   :0.02839   Mean   :0.002038   Mean   :0.04891  
##  3rd Qu.:0.06667   3rd Qu.:0.03422   3rd Qu.:0.002747   3rd Qu.:0.06000  
##  Max.   :1.00000   Max.   :1.00000   Max.   :1.000000   Max.   :1.00000  
##       x40               x41              x42              x43         
##  Min.   :0.00000   Min.   :0.0000   Min.   :0.0000   Min.   :0.00000  
##  1st Qu.:0.01802   1st Qu.:0.2001   1st Qu.:0.2000   1st Qu.:0.01407  
##  Median :0.02613   Median :0.3236   Median :0.3231   Median :0.03140  
##  Mean   :0.03030   Mean   :0.3317   Mean   :0.3325   Mean   :0.04057  
##  3rd Qu.:0.03694   3rd Qu.:0.4467   3rd Qu.:0.4462   3rd Qu.:0.05525  
##  Max.   :1.00000   Max.   :1.0000   Max.   :1.0000   Max.   :1.00000  
##       x44               x45              x46              x47        
##  Min.   :0.00000   Min.   :0.0000   Min.   :0.0000   Min.   :0.0000  
##  1st Qu.:0.01099   1st Qu.:0.3080   1st Qu.:0.2500   1st Qu.:0.2000  
##  Median :0.02747   Median :0.5380   Median :0.5000   Median :0.5000  
##  Mean   :0.03589   Mean   :0.5404   Mean   :0.5204   Mean   :0.5044  
##  3rd Qu.:0.04945   3rd Qu.:0.8130   3rd Qu.:0.8000   3rd Qu.:0.8000  
##  Max.   :1.00000   Max.   :1.0000   Max.   :1.0000   Max.   :1.0000  
##       x48              x49              x50              x51        
##  Min.   :0.0000   Min.   :0.0000   Min.   :0.0000   Min.   :0.0000  
##  1st Qu.:0.0040   1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.0000  
##  Median :0.4680   Median :0.4540   Median :0.4610   Median :0.4570  
##  Mean   :0.4702   Mean   :0.4602   Mean   :0.4642   Mean   :0.4616  
##  3rd Qu.:0.8120   3rd Qu.:0.7940   3rd Qu.:0.8130   3rd Qu.:0.8160  
##  Max.   :1.0000   Max.   :1.0000   Max.   :1.0000   Max.   :1.0000  
##       x52             x53              x54              x55         
##  Min.   :0.000   Min.   :0.0000   Min.   :0.0000   Min.   :0.00000  
##  1st Qu.:0.000   1st Qu.:0.1847   1st Qu.:0.2817   1st Qu.:0.00000  
##  Median :0.457   Median :0.2995   Median :0.4476   Median :0.00000  
##  Mean   :0.464   Mean   :0.3146   Mean   :0.4496   Mean   :0.04274  
##  3rd Qu.:0.841   3rd Qu.:0.4344   3rd Qu.:0.6431   3rd Qu.:0.01379  
##  Max.   :1.000   Max.   :1.0000   Max.   :1.0000   Max.   :1.00000  
##       x56               x57               x58               x59         
##  Min.   :0.00000   Min.   :0.00000   Min.   :0.00000   Min.   :0.00000  
##  1st Qu.:0.04571   1st Qu.:0.04412   1st Qu.:0.01220   1st Qu.:0.00000  
##  Median :0.19318   Median :0.18182   Median :0.08621   Median :0.02273  
##  Mean   :0.28992   Mean   :0.27797   Mean   :0.19262   Mean   :0.10931  
##  3rd Qu.:0.47085   3rd Qu.:0.44473   3rd Qu.:0.27612   3rd Qu.:0.11398  
##  Max.   :1.00000   Max.   :1.00000   Max.   :1.00000   Max.   :1.00000  
##       x60               x61          
##  Min.   :0.00000   Min.   :0.000000  
##  1st Qu.:0.00000   1st Qu.:0.007325  
##  Median :0.00000   Median :0.019044  
##  Mean   :0.04726   Mean   :0.028528  
##  3rd Qu.:0.01980   3rd Qu.:0.037539  
##  Max.   :1.00000   Max.   :1.000000

Let us look at the normalization results graphically, using 5% of the data.
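The text does not say how the 5% subset is drawn. A minimal sketch, assuming a simple random sample without replacement with a fixed seed, so the plots are reproducible. `sample_fraction` and `seed=42` are hypothetical.

```python
import random

def sample_fraction(rows, fraction=0.05, seed=42):
    """Return a reproducible random subset of about `fraction` of the rows."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    k = max(1, round(len(rows) * fraction))
    rng = random.Random(seed)
    # Sample row indices without replacement, then keep the original row order.
    idx = sorted(rng.sample(range(len(rows)), k))
    return [rows[i] for i in idx]

rows = [(i, i * 0.5) for i in range(1000)]
subset = sample_fraction(rows)
print(len(subset))
```

Sampling only thins out the points for plotting. The normalization itself should still be fitted on the full data, so the subset shows the same scaled values.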

Selective normalization

TODO:

  1. Move the scatterplot construction into a separate function.

Based on a visual analysis of the scatter plots from the exploratory data analysis (EDA) stage, we settle on the following normalization algorithm: